Institute of Molecular Life Sciences

Author

  • Mark D. Robinson
Abstract

A recent slew of ENCyclopedia Of DNA Elements (ENCODE) Consortium publications, specifically the article signed by all Consortium members, put forward the idea that more than 80% of the human genome is functional. This claim flies in the face of current estimates according to which the fraction of the genome that is evolutionarily conserved through purifying selection is less than 10%. Thus, according to the ENCODE Consortium, a biological function can be maintained indefinitely without selection, which implies that at least 80 − 10 = 70% of the genome is perfectly invulnerable to deleterious mutations, either because no mutation can ever occur in these “functional” regions or because no mutation in these regions can ever be deleterious. This absurd conclusion was reached through various means, chiefly by employing the seldom-used “causal role” definition of biological function and then applying it inconsistently to different biochemical properties, by committing a logical fallacy known as “affirming the consequent,” by failing to appreciate the crucial difference between “junk DNA” and “garbage DNA,” by using analytical methods that yield biased errors and inflate estimates of functionality, by favoring statistical sensitivity over specificity, and by emphasizing statistical significance rather than the magnitude of the effect. Here, we detail the many logical and methodological transgressions involved in assigning functionality to almost every nucleotide in the human genome. The ENCODE results were predicted by one of its authors to necessitate the rewriting of textbooks. We agree: many textbooks dealing with marketing, mass-media hype, and public relations may well have to be rewritten.

Key words: junk DNA, genome functionality, selection, ENCODE project.
Graur et al., “Function” and the Evolution-Free Gospel of ENCODE. Genome Biol. Evol. 5(3):578–590. doi:10.1093/gbe/evt028. Advance Access publication February 20, 2013. © The Author(s) 2013. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

“Data is not information, information is not knowledge, knowledge is not wisdom, wisdom is not truth.”
Robert Royar (1994), paraphrasing Frank Zappa’s (1979) anadiplosis

“I would be quite proud to have served on the committee that designed the E. coli genome. There is, however, no way that I would admit to serving on a committee that designed the human genome. Not even a university committee could botch something that badly.”
David Penny (personal communication)

“The onion test is a simple reality check for anyone who thinks they can assign a function to every nucleotide in the human genome. Whatever your proposed functions are, ask yourself this question: Why does an onion need a genome that is about five times larger than ours?”
T. Ryan Gregory (personal communication)

Early releases of the ENCyclopedia Of DNA Elements (ENCODE) were mainly aimed at providing a “parts list” for the human genome (ENCODE Project Consortium 2004). The latest batch of ENCODE Consortium publications, specifically the article signed by all Consortium members (ENCODE Project Consortium 2012), has much more ambitious interpretative aims (and a much better orchestrated public relations campaign). The ENCODE Consortium aims to convince its readers that almost every nucleotide in the human genome has a function and that these functions can be maintained indefinitely without selection. ENCODE accomplishes these aims mainly by playing fast and loose with the term “function,” by divorcing genomic analysis from its evolutionary context and ignoring a century of population genetics theory, and by employing methods that consistently overestimate functionality. At the same time, it is very careful that these estimates do not reach 100%. More generally, the ENCODE Consortium has fallen prey to the genomic equivalent of apophenia (Brugger 2001; Fyfe et al. 2008). Apophenia is the human propensity to see meaningful patterns in random data, and it has brought us other “codes” in the past (Witztum 1994; Schinner 2007).

Three papers have already commented critically on aspects of the ENCODE inferences (Eddy 2012; Niu and Jiang 2013; Bray and Pachter 2013), but without addressing the issues exclusively from an evolutionary genomics perspective. In the following, we shall dissect several logical, methodological, and statistical improprieties involved in assigning functionality to almost every nucleotide in the genome. We shall only deal with a single article (ENCODE Project Consortium 2012) out of more than 30 that have been published since the 6 September 2012 release. We shall also refer to three commentaries, all trumpeting the death of “junk DNA.” One was written by a scientist and two by Science journalists (Ecker 2012; Pennisi 2012a, 2012b).

“Selected Effect” and “Causal Role” Functions

The ENCODE Project Consortium assigns function to 80.4% of the genome (ENCODE Project Consortium 2012). We disagree with this estimate. However, before challenging it, it is necessary to discuss the meaning of “function” and “functionality.” Like many words in the English language, these terms have numerous meanings. What meaning, then, should we use? In biology, there are two main concepts of function: the “selected effect” and “causal role” concepts.

The “selected effect” concept is historical and evolutionary (Millikan 1989; Neander 1991). Accordingly, for a trait, T, to have a proper biological function, F, it is necessary and (almost) sufficient that the following two conditions hold: 1) T originated as a “reproduction” (a copy or a copy of a copy) of some prior trait that performed F (or some function similar to F) in the past, and 2) T exists because of F (Millikan 1989). In other words, the “selected effect” function of a trait is the effect for which it was selected, or by which it is maintained. In contrast, the “causal role” concept is ahistorical and nonevolutionary (Cummins 1975; Amundson and Lauder 1994). That is, for a trait, Q, to have a “causal role” function, G, it is necessary and sufficient that Q performs G.

For clarity, let us use the following illustration (Griffiths 2009). There are two almost identical sequences in the genome. The first, TATAAA, has been maintained by natural selection to bind a transcription factor; hence, its selected effect function is to bind this transcription factor. A second sequence has arisen by mutation and, purely by chance, it resembles the first sequence; therefore, it also binds the transcription factor. However, transcription factor binding to the second sequence does not result in transcription; that is, it has no adaptive or maladaptive consequence. Thus, the second sequence has no selected effect function, but its causal role function is to bind a transcription factor.

The causal role concept of function can lead to bizarre outcomes in the biological sciences. For example, the selected effect function of the heart can be stated unambiguously to be the pumping of blood. Yet the heart may be assigned many additional causal role functions, such as adding 300 g to body weight, producing sounds, and preventing the pericardium from deflating onto itself. As a result, most biologists use the selected effect concept of function. They follow the Dobzhanskyan dictum according to which biological sense can only be derived from evolutionary context. We note that the causal role concept may sometimes be useful, mostly as an ad hoc device for traits whose evolutionary history and underlying biology are obscure. This is obviously not the case with DNA sequences.

The main advantage of the selected effect definition of function is that it suggests a clear and conservative method of inference for function in DNA sequences. Only sequences that can be shown to be under selection can be claimed with any degree of confidence to be functional. The selected effect definition of function has led to the discovery of many new functions, for example, microRNAs (Lee et al. 1993). It has also led to the rejection of putative functions, for example, numts (Hazkani-Covo et al. 2010).

From an evolutionary viewpoint, a function can be assigned to a DNA sequence if and only if it is possible to destroy it. All functional entities in the universe can be rendered nonfunctional by the ravages of time, entropy, mutation, and what have you. Unless a genomic functionality is actively protected by selection, it will accumulate deleterious mutations and will cease to be functional. The absurd alternative, which unfortunately was adopted by ENCODE, is to assume that no deleterious mutations can ever occur in the regions they have deemed to be functional. Such an assumption is akin to claiming that a television set left on and unattended will still be in working condition after a million years. The claim would rest on the premise that no natural events, such as rust, erosion, static electricity, and earthquakes, can affect it. A lead ENCODE author (Stamatoyannopoulos 2012) put forward a convoluted rationale for discarding evolutionary conservation and constraint as the arbiters of functionality. This rationale is groundless and self-serving.

Of course, it is not always easy to detect selection. Functional sequences may be under selection regimes that are difficult to detect, such as positive selection or weak (statistically undetectable) purifying selection. Alternatively, they may be recently evolved species-specific elements. We recognize these difficulties, but it would be ridiculous to assume that 70+% of the human genome consists of elements under undetectable selection. This is especially so given other pieces of evidence, such as mutational load (Knudson 1979; Charlesworth et al. 1993). Hence, the proportion of the human genome that is functional is likely to be somewhat larger than the approximately 9% for which there exists some evidence for selection (Smith et al. 2004). However, the fraction is unlikely to be anything even approaching 80%. Finally, we would like to emphasize one point. The fact that it is sometimes difficult to identify selection should never be used as a justification to ignore selection altogether in assigning functionality to parts of the human genome.

ENCODE adopted a strong version of the causal role definition of function. According to this definition, a functional element is a discrete genome segment that produces a protein or an RNA or displays a reproducible biochemical signature (e.g., protein binding). Oddly, ENCODE not only uses the wrong concept of functionality, it uses it wrongly and inconsistently (see below).

Using the Wrong Definition of “Functionality” Wrongly

Estimates of functionality based on conservation are likely to be, well, conservative. Thus, the aim of the ENCODE Consortium to identify functions experimentally is, in principle, a worthy one. We have already seen that ENCODE uses an evolution-free definition of “functionality.” Let us, for the sake of argument, assume that there is nothing wrong with this practice. Do they use the concept of causal role function properly? According to ENCODE, for a DNA segment to be ascribed functionality it needs to 1) be transcribed, 2) be associated with a modified histone, 3) be located in an open-chromatin area, 4) bind a transcription factor, or 5) contain a methylated CpG dinucleotide. We note that most of these properties of DNA do not describe a function. Some describe a particular genomic location or a feature related to nucleotide composition.

To turn these properties into causal role functions, the ENCODE authors engage in a logical fallacy known as “affirming the consequent.” The ENCODE argument goes like this:
1. DNA segments that “function” in a particular biological process (e.g., regulating transcription) tend to display a certain “property” (e.g., transcription factors bind to them).
2. A DNA segment displays the same “property.”
3. Therefore, the DNA segment is “functional.”
(More succinctly: if function, then property; thus, if property, therefore function.) This kind of argument is invalid because a DNA segment may display a property without necessarily manifesting the putative function. For example, a random sequence may bind a transcription factor, but that may not result in transcription. The ENCODE authors apply this flawed reasoning to all their functions.

Is 80% of the Genome Functional? Or Is It 100%? Or 40%? No Wait . . .

So far, we have seen that as far as functionality is concerned, ENCODE used the wrong definition wrongly. We must now address the question of consistency. Specifically, did ENCODE use the wrong definition wrongly in a consistent manner? We do not think so. For example, the ENCODE authors singled out transcription as a function, as if the passage of RNA polymerase through a DNA sequence were in some way more meaningful than other functions. But what about DNA polymerase and DNA replication? Why make a big fuss about the 74.7% of the genome that is transcribed? Meanwhile, they ignore the fact that 100% of the genome takes part in a strikingly “reproducible biochemical signature”: it replicates!

Actually, the ENCODE authors could have chosen any of a number of arbitrary percentages as “functional,” and . . . they did! In their scientific publications, ENCODE promoted the idea that 80% of the human genome was functional. The scientific commentators followed, and proclaimed that at least 80% of the genome is “active and needed” (Kolata 2012). Subsequently, one of the lead authors of ENCODE admitted that the press conference misled people by claiming that 80% of our genome was “essential and useful.” He put that number at 40% (Gregory 2012). Another lead author reduced the fraction of the genome that is devoted to function to merely 20% (Hall 2012). Interestingly, even after reducing the functional genomic fraction to 20%, this lead author continued to insist that the term “junk DNA” needs “to be totally expunged from the lexicon.” In doing so, he invented a new arithmetic according to which 20% > 80%. In its synopsis of the year 2012, the journal Nature adopted the more modest estimate. It summarized the findings of ENCODE by stating that “at least 20% of the genome can influence gene expression” (Van Noorden 2012). Science stuck to its maximalist guns, and its summary of 2012 repeated the claim that the “functional portion” of the human genome equals 80% (Anonymous 2012). Unfortunately, neither 80% nor 20% is based on actual evidence.

The ENCODE Incongruity

Armed with the proper concept of function, one can derive expectations concerning the rates and patterns of evolution of functional and nonfunctional parts of the genome. The surest indicator of the existence of a genomic function is that losing it has some phenotypic consequence for the organism. Countless natural experiments testing the functionality of every region of the human genome through mutation have taken place over millions of years of evolution in our ancestors and close relatives. As most mutations in functional regions . . .

Institute of Molecular Life Sciences, 26.06.13, Epigenomics, Mark D. Robinson, Page 29

Epigenomics
– Part 1: Intro to epigenetics
– Part 2: High-throughput technologies
– Part 3: Computational methods
Mark D. Robinson, Statistical Genomics, IMLS

Overview of this part
– Goal: highlight where informatics approaches are being used; insights into (a subset of) bioinformatics research related to epigenomics
– Methods for individual platforms
   – DNA methylation
      • (BS-microarray) Illumina 450k array
      • (Affinity capture) BATMAN + new methods
   – Peak/region detection
      • MACS
      • Copy number and MBD/ChIP-seq
– Methods for integrating multiple data types
   • ChromHMM, Segway, ENCODE SOM “donuts”
   • Clustering: Repitools

Bisulphite conversion + “genotyping” array (Illumina HumanMethylation450)
[Figure: Type I (2 probes) and Type II (1 probe) designs at an unmethylated and a methylated CpG site; from Bibikova et al., Genomics 2011]
beta = M / (M + U + e). Here M and U are the methylated and unmethylated signal intensities, and e is a small offset.

450k arrays: probe-type bias
– Overall, very good correspondence between the 450k platform and others (e.g., BS-seq)
– Normalization issues for different probe types (current research)

450k array data
– Very different behaviour of Type I and Type II probes

SWAN: Subset-quantile Within Array Normalization (Maksimovic et al. 2012)
– Quantile normalization based on the number of CpG sites
– Outcome: makes Infinium I and II beta value distributions more similar
[Figure: SWAN, minfi]
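The beta formula and the SWAN idea above can be illustrated with a short sketch. It is a simplified, hypothetical illustration. It is not the published SWAN algorithm, and it is not minfi's implementation.

The code makes the following assumptions:
– The offset e is assumed to be 100, a value commonly used for Illumina arrays.
– Negative intensities are assumed to be clipped to zero.
– Normalization works within each group of probes that share the same number of CpG sites. In each group, it maps the Type I and Type II intensities onto one common reference distribution: the average of the two types' quantile functions.

The names `beta_value` and `stratified_quantile_normalize` are invented for this example. The published method differs in its details, so treat this only as a picture of the idea.

```python
from collections import defaultdict


def beta_value(meth, unmeth, offset=100.0):
    """Beta = M / (M + U + offset), clipping negative intensities to zero.

    The default offset of 100 is an assumption (a value commonly used for
    Illumina arrays); it stabilises beta when both intensities are small.
    """
    m, u = max(meth, 0.0), max(unmeth, 0.0)
    return m / (m + u + offset)


def _quantile(sorted_vals, p):
    """Quantile of an already-sorted list at p in [0, 1], by linear interpolation."""
    if len(sorted_vals) == 1:
        return sorted_vals[0]
    pos = p * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def _ranks_to_reference(values, reference):
    """Replace each value by the reference quantile at its rank (ties keep input order)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    out = [0.0] * n
    for rank, i in enumerate(order):
        p = rank / (n - 1) if n > 1 else 0.5
        out[i] = reference(p)
    return out


def stratified_quantile_normalize(probe_types, n_cpgs, intensities):
    """Hypothetical helper: align Type I ("I") and Type II ("II") intensities
    within each CpG-count stratum.

    Inside a stratum, both probe types are mapped onto the average of their
    two quantile functions, so their distributions match while the ranking
    of probes within each type is preserved.
    """
    strata = defaultdict(lambda: {"I": [], "II": []})
    for idx, (ptype, k) in enumerate(zip(probe_types, n_cpgs)):
        strata[k][ptype].append(idx)

    result = list(intensities)
    for groups in strata.values():
        idx_i, idx_ii = groups["I"], groups["II"]
        if not idx_i or not idx_ii:
            continue  # only one probe type present: leave this stratum unchanged
        sorted_i = sorted(intensities[j] for j in idx_i)
        sorted_ii = sorted(intensities[j] for j in idx_ii)

        def reference(p, a=sorted_i, b=sorted_ii):
            # Common target distribution: mean of the two quantile functions.
            return (_quantile(a, p) + _quantile(b, p)) / 2

        for group in (idx_i, idx_ii):
            normalized = _ranks_to_reference([intensities[j] for j in group], reference)
            for j, v in zip(group, normalized):
                result[j] = v
    return result
```

As a usage example, take two Type I and two Type II probes that each cover one CpG:
– Type I methylated intensities [1000, 3000] become [1500, 4500].
– Type II methylated intensities [2000, 6000] also become [1500, 4500].

In practice, one would normalize the methylated and unmethylated channels separately, then compute beta values from the normalized intensities.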



Publication date: 2013